Weighting Distributional Features for Automatic Semantic Classification of Words
Abstract
The paper is concerned with weighting distributional features of words with the aim of improving their automatic semantic classification, a task relevant to a number of NLP applications such as lexicon acquisition or named entity recognition. The paper aims to draw attention to the differences between two major weighting strategies: Discriminative Feature Weighting and Characteristic Feature Weighting. The comparative study includes three popular discriminative weighting functions (Mutual Information, Information Gain, and Gain Ratio) and three characteristic weighting functions (Term Strength and the two newly introduced Local Term Strength and Confidence). We find that each strategy has its own optimal settings, while both interact similarly with the parameter optimization of the learning algorithm.
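The contrast between the two strategies can be illustrated on a toy example. The sketch below is an illustration under stated assumptions, not the paper's implementation. It assumes binary context features and uses common textbook definitions:

* Pointwise Mutual Information, maximised over classes, which is an assumed aggregation choice.
* Information Gain.
* Gain Ratio.
* Term Strength, where "related" word pairs are assumed to be distinct words of the same class.

Local Term Strength and Confidence are not reproduced, because their definitions are not given here. The data and the names `WORDS`, `discriminative_weights`, and `term_strength` are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

# Hypothetical toy data (not from the paper): each word maps to the set of
# binary context features it co-occurs with, plus its gold semantic class.
WORDS = {
    "apple":  ({"eat", "ripe", "peel"}, "fruit"),
    "pear":   ({"eat", "ripe"}, "fruit"),
    "plum":   ({"ripe", "peel"}, "fruit"),
    "hammer": ({"use", "swing"}, "tool"),
    "saw":    ({"use", "sharp"}, "tool"),
    "knife":  ({"use", "sharp", "peel"}, "tool"),
}


def entropy(probs):
    """Shannon entropy in bits; zero probabilities contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def class_dist(labels):
    """Relative frequency of each class label in a non-empty list."""
    counts = Counter(labels)
    return [c / len(labels) for c in counts.values()]


def discriminative_weights(feature, data=WORDS):
    """Return (MI, IG, GR) for one feature, using common textbook forms.

    MI is pointwise mutual information between the feature and a class,
    maximised over classes (an assumed aggregation choice).
    """
    labels = [cls for _, cls in data.values()]
    with_f = [cls for feats, cls in data.values() if feature in feats]
    without_f = [cls for feats, cls in data.values() if feature not in feats]
    n = len(labels)
    p_f = len(with_f) / n
    if p_f == 0:
        return 0.0, 0.0, 0.0  # feature never observed: no evidence

    class_counts = Counter(labels)
    joint = Counter(with_f)  # only classes that co-occur with the feature
    mi = max(
        math.log2((joint[c] / n) / (p_f * class_counts[c] / n)) for c in joint
    )

    # Information Gain: drop in class entropy once the feature is known.
    h_cond = p_f * entropy(class_dist(with_f))
    if without_f:
        h_cond += (1 - p_f) * entropy(class_dist(without_f))
    ig = entropy(class_dist(labels)) - h_cond

    # Gain Ratio: IG normalised by the entropy of the present/absent split.
    split = entropy([p_f, 1 - p_f])
    gr = ig / split if split > 0 else 0.0
    return mi, ig, gr


def term_strength(feature, data=WORDS):
    """Estimate P(feature in y | feature in x) over ordered pairs (x, y) of
    distinct words sharing a class; 'same class' stands in for 'related'."""
    hits = total = 0
    for (fx, cx), (fy, cy) in permutations(data.values(), 2):
        if cx == cy and feature in fx:
            total += 1
            hits += feature in fy  # bool counts as 0 or 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    for f in sorted({f for feats, _ in WORDS.values() for f in feats}):
        mi, ig, gr = discriminative_weights(f)
        print(f"{f:6s} MI={mi:.3f} IG={ig:.3f} GR={gr:.3f} TS={term_strength(f):.3f}")
```

On this toy data, `swing` occurs only with one tool word. It therefore gets an MI of 1.0 but a Term Strength of 0.0, because it never recurs in another word of the same class. This shows how a discriminative and a characteristic function can rank the same feature quite differently.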
Similar resources
An Improvement in Support Vector Machines Algorithm with Imperialism Competitive Algorithm for Text Documents Classification
Due to the exponential growth of electronic texts, organizing and managing them requires tools that deliver the information users search for in the shortest possible time. Thus, classification methods have become very important in recent years. In natural language processing, and especially text processing, one of the most basic tasks is automatic text classification. Moreover, text ...
The Distribution of Mood: An Exploration of Distributional Compositions in Sentiment Classification
Distributional semantics is a research area investigating unsupervised data-driven models for quantifying semantic relatedness. This thesis investigates the possibilities of using distributional semantic models for sentiment classification of utterances, by composing distributional vectors of words in utterances. For evaluation, I use a set of manually classified movie reviews. While the purpose ...
A Supervised Learning Approach to Automatic Synonym Identification Based on Distributional Features
Distributional similarity has been widely used to capture the semantic relatedness of words in many NLP tasks. However, various parameters such as similarity measures must be hand-tuned to make it work effectively. Instead, we propose a novel approach to synonym identification based on supervised learning and distributional features, which correspond to the commonality of individual context type...
Word classification based on combined measures of distributional and semantic similarity
The paper addresses the problem of automatic enrichment of a thesaurus by classifying new words into its classes. The proposed classification method makes use of both the distributional data about a new word and the strength of the semantic relatedness of its target class to other likely candidate classes.
Bootstrapping Distributional Feature Vector Quality
This article presents a novel bootstrapping approach for improving the quality of feature vector weighting in distributional word similarity. The method was motivated by attempts to utilize distributional similarity for identifying the concrete semantic relationship of lexical entailment. Our analysis revealed that a major reason for the rather loose semantic similarity obtained by distribution...